Abstract

Existing approaches to combine both additive and multiplicative
neural units either use a fixed assignment of operations or require
discrete optimization to determine what function a neuron should
perform. This leads either to an inefficient distribution of
computational resources or an extensive increase in the computational
complexity of the training procedure.

We present a novel, parameterizable transfer function based on the
mathematical concept of non-integer functional iteration that allows
the operation each neuron performs to be smoothly and, most
importantly, differentiably adjusted between addition and
multiplication. This allows the decision between addition and
multiplication to be integrated into the standard backpropagation
training procedure.


1. Introduction 

In commonplace artificial neural networks (ANNs) the value of a
neuron is given by a weighted sum of its inputs propagated through a
non-linear transfer function. For illustration let us consider a
simple neural network with multidimensional input and multivariate
output. Let the input layer be called $x$ and the outputs $y$. Then
the value of neuron $y_i$ is

$$ y_i = \sigma\Big( \sum_j W_{ij}\, x_j \Big). \tag{1} $$

The typical choice for the transfer function $\sigma(t)$ is the
sigmoid function $\sigma(t) = 1/(1 + e^{-t})$ or an approximation
thereof. Matrix multiplication is used to jointly compute the values
of all neurons in one layer more efficiently; we have

$$ y = \sigma(W x) \tag{2} $$

where the transfer function is applied element-wise,
$\sigma_i(t) = \sigma(t_i)$. In the context of this paper we will call
such networks additive ANNs.

(Hornik et al., 1989) showed that additive ANNs with at least one
hidden layer and a sigmoidal transfer function are able to approximate
any function arbitrarily well given a sufficient number of hidden
units. Even though an additive ANN is a universal function
approximator, there is no guarantee that it can approximate a function
efficiently. If the architecture is not a good match for a particular
problem, a very large number of neurons is required to obtain
acceptable results.

(Durbin & Rumelhart, 1989) proposed an alternative neural unit in
which the weighted summation is replaced by a product, where each
input is raised to a power determined by its corresponding weight. The
value of such a product unit is given by

$$ y_i = \sigma\Big( \prod_j x_j^{W_{ij}} \Big). \tag{3} $$

Using laws of the exponential function this can be written as
$y_i = \sigma[\exp(\sum_j W_{ij} \log x_j)]$ and thus the values of a
layer can also be computed efficiently using matrix multiplication,
i.e.

$$ y = \sigma(\exp(W \log x)) \tag{4} $$

where exp and log are taken element-wise. Since in general the
incoming values $x$ can be negative, the complex exponential and
logarithm are used. Often no non-linearity is applied to the output of
the product unit.
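The two layer types introduced so far can be compared in a few lines of code. The following is a minimal sketch, assuming strictly positive inputs so that the real logarithm suffices (the complex case discussed in the text is not covered); the function names and the example weights are illustrative and not part of the original work.

```python
import math

def sigmoid(t):
    """Logistic transfer function sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def additive_layer(W, x):
    """Additive layer, y = sigma(W x), written out for nested lists."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in W]

def product_layer(W, x):
    """Product-unit layer, y = sigma(exp(W log x)); assumes every x_j > 0."""
    log_x = [math.log(xj) for xj in x]
    return [sigmoid(math.exp(sum(w * lx for w, lx in zip(row, log_x))))
            for row in W]

# Illustrative weights and inputs (assumptions, not taken from the paper).
W = [[1.0, 2.0], [0.5, -1.0]]
x = [2.0, 3.0]

# Reference value: the product of powers computed directly.
direct = [sigmoid(math.prod(xj ** w for w, xj in zip(row, x))) for row in W]
from_logs = product_layer(W, x)
```

Both routes give the same layer output up to rounding, which is why a product layer can reuse the matrix-multiplication machinery of an additive layer.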


1.1. Hybrid summation-multiplication networks

Both types of neurons can be combined in a hybrid
summation-multiplication network. Yet this poses the problem of how to
distribute additive and multiplicative units over the network, i.e.
how to determine whether a specific neuron should be an additive or
multiplicative unit to obtain the best results. A simple solution is
to stack alternating layers of additive and product units, optionally
with additional connections that skip over a product layer, so that
each additive layer receives inputs from both the product layer and
the additive layer beneath it. The drawback of this approach is that
the resulting uniform distribution of product units will hardly be
ideal.

A more adaptive approach is to learn the function of each neural unit
from provided training data. However, since addition and
multiplication are different operations, until now there was no
obvious way to determine the best operation during training of the
network using standard neural network optimization methods such as
backpropagation. An iterative algorithm to determine the optimal
allocation could have the following structure: For initialization
randomly choose the operation each neuron performs. Train the network
by minimizing the error function and then evaluate its performance on
a validation set. Based on the performance determine a new allocation
using a discrete optimization algorithm (such as particle swarm
optimization or genetic algorithms). Iterate the process until
satisfactory performance is achieved. The drawback of this method is
its computational complexity; to evaluate one allocation of operations
the whole network must be trained, which takes from minutes to hours
for moderately sized problems.

Here we propose an alternative approach, where the distinction between
additive and multiplicative neurons is not discrete but continuous and
differentiable. Hence the optimal distribution of additive and
multiplicative units can be determined during standard gradient-based
optimization. Our approach is organized as follows: First, we
introduce non-integer iterates of the exponential function in the real
and complex domains. We then use these iterates to smoothly
interpolate between addition (1) and multiplication (3). Finally, we
show how this interpolation can be integrated and implemented in
neural networks.

2. Iterates of the exponential function

2.1. Functional iteration

Let $f : \mathbb{C} \to \mathbb{C}$ be an invertible function. For
$n \in \mathbb{Z}$ we write $f^{(n)}$ for the $n$-times iterated
application of $f$,

$$ f^{(n)}(z) = \underbrace{f \circ f \circ \cdots \circ f}_{n \text{ times}}(z). \tag{5} $$

Further let $f^{(-n)} = (f^{-1})^{(n)}$ where $f^{-1}$ denotes the
inverse of $f$. We set $f^{(0)}(z) = z$ to be the identity function.
It can be easily verified that functional iteration with respect to
the composition operator, i.e.

$$ f^{(n)} \circ f^{(m)} = f^{(n+m)} \tag{6} $$

for $n, m \in \mathbb{Z}$, forms an Abelian group.

Equation (5) cannot be used to define functional iteration for
non-integer $n$. Thus, in order to calculate non-integer iterations of
a function, we have to find an alternative definition. The sought
generalization should also extend the additive property (6) of the
composition operation to non-integer $n, m \in \mathbb{R}$.

2.2. Abel’s functional equation

Consider the following functional equation given by (Abel, 1826),

$$ \psi(f(x)) = \psi(x) + \beta \tag{7} $$

with constant $\beta \in \mathbb{C}$. We are concerned with
$f(x) = \exp(x)$. A continuously differentiable solution for
$\beta = 1$ and $x \in \mathbb{R}$ is given by

$$ \psi(x) = \log^{(k)}(x) + k \tag{8} $$

with $k \in \{-1, 0, 1, 2, \ldots\}$ s.t. $0 \le \log^{(k)}(x) < 1$.
Note that for $x < 0$ we have $k = -1$ and thus $\psi$ is well defined
on the whole of $\mathbb{R}$. The function is shown in Fig. 1. Since
$\psi : \mathbb{R} \to (-1, \infty)$ is strictly increasing, the
inverse $\psi^{-1} : (-1, \infty) \to \mathbb{R}$ exists and is given
by

$$ \psi^{-1}(\psi) = \exp^{(k)}(\psi - k) \tag{9} $$

with $k$ s.t. $0 \le \psi - k < 1$. For practical reasons we set
$\psi^{-1}(\psi) = -\infty$ for $\psi \le -1$. The derivative of
$\psi$ is given by

$$ \psi'(x) = \prod_{j=0}^{k-1} \frac{1}{\log^{(j)}(x)} \tag{10a} $$

with $k$ chosen as in (8), and the derivative of its inverse is

$$ \psi'^{-1}(\psi) = \prod_{j=0}^{k-1} \psi^{-1}(\psi - j) \tag{10b} $$

with $k$ chosen as in (9).

2.2.1. Non-integer iterates using Abel’s equation

By inspection of Abel’s equation (7), we see that the $n$th iterate of
the exponential function can be written as

$$ \exp^{(n)}(x) = \psi^{-1}(\psi(x) + n). \tag{11} $$

While this equation is equivalent to (5) for integer $n$, we are now
also free to choose $n \in \mathbb{R}$ and thus (11) can be seen as a
generalization of functional iteration to non-integer iterates. It can
easily be verified that the composition property (6) holds. Hence we
can understand the function $\varphi(x) = \exp^{(1/2)}(x)$ as the
function that gives the exponential function when applied to itself.
$\varphi$ is called the functional square root of exp and we have
$\varphi(\varphi(x)) = \exp(x)$ for all $x \in \mathbb{R}$. Likewise
$\exp^{(1/N)}$ is the function that gives the exponential function
when iterated $N$ times.

Figure 1. A continuously differentiable solution $\psi(x)$ to Abel’s
equation (7) for the exponential function in the real domain.

Figure 2. Iterates of the exponential function $\exp^{(n)}(x)$ for
$n \in \{-1, -0.9, \ldots, 0, \ldots, 0.9, 1\}$ obtained using the
solution (11) of Abel’s equation.

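Since $n$ is a continuous parameter in definition (11) we can take the
derivative of $\exp^{(n)}$ with respect to its argument as well as
$n$. Writing $\psi'^{-1}$ for the derivative of $\psi^{-1}$, they are
given by

$$ \exp'^{(n)}(x) = \frac{\partial \exp^{(n)}(x)}{\partial x} = \psi'^{-1}(\psi(x) + n)\, \psi'(x) \tag{12a} $$

$$ \exp^{(n')}(x) = \frac{\partial \exp^{(n)}(x)}{\partial n} = \psi'^{-1}(\psi(x) + n). \tag{12b} $$

Thus (8) provides a method to interpolate between the exponential
function, the identity function and the logarithm in a continuous and
differentiable way.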
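The real-valued construction can be sketched in code as follows. This is an illustrative implementation under the assumption that $\psi$ is evaluated by repeatedly applying the logarithm until the value falls into $[0, 1)$, with the case $x < 0$ handled by a single exponentiation; the function names `psi`, `psi_inv`, `exp_n` and `half_exp` are chosen here and do not come from the original work.

```python
import math

def psi(x):
    """Solution of Abel's equation for exp: psi(x) = log^(k)(x) + k."""
    if x < 0.0:                      # k = -1: log^(-1)(x) = exp(x) lies in (0, 1)
        return math.exp(x) - 1.0
    k = 0
    while x >= 1.0:                  # apply log until the value falls into [0, 1)
        x = math.log(x)
        k += 1
    return x + k

def psi_inv(p):
    """Inverse of psi; returns -inf for p <= -1, following the text."""
    if p <= -1.0:
        return -math.inf
    if p < 0.0:                      # k = -1: exp^(-1)(p + 1) = log(p + 1)
        return math.log(p + 1.0)
    k = math.floor(p)
    x = p - k
    for _ in range(k):               # may overflow for large p (tower of exps)
        x = math.exp(x)
    return x

def exp_n(x, n):
    """Non-integer iterate exp^(n)(x) = psi_inv(psi(x) + n)."""
    return psi_inv(psi(x) + n)

def half_exp(x):
    """Functional square root of exp, i.e. exp^(1/2)."""
    return exp_n(x, 0.5)
```

Applying `half_exp` twice reproduces `math.exp` up to rounding, and `exp_n(x, -1.0)` reproduces `math.log(x)` for positive `x`, which matches the interpolation between logarithm, identity and exponential described above.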


2.3. Schröder’s functional equation

Motivated by the necessity to evaluate the logarithm for negative
arguments, we derive a solution of Abel’s equation for the complex
exponential function. Applying the substitution

$$ \psi(x) = \frac{\log \chi(x)}{\log \gamma} $$

in Abel’s equation (7) gives a functional equation first examined by
(Schröder, 1870),

$$ \chi(f(z)) = \gamma\, \chi(z) \tag{13} $$

with constant $\gamma \in \mathbb{C}$. As before we are interested in
solutions of this equation for $f(x) = \exp(x)$; we have

$$ \chi(\exp(z)) = \gamma\, \chi(z) \tag{14} $$

but now we are considering the complex
$\exp : \mathbb{C} \to \mathbb{C}$.

The complex exponential function is not injective, since

$$ \exp(z + 2\pi n i) = \exp(z), \qquad n \in \mathbb{Z}. $$

Thus the imaginary part of the codomain of its inverse, i.e. the
complex logarithm, must be restricted to an interval of size $2\pi$.
Here we define
$\log : \mathbb{C} \to \{z \in \mathbb{C} : \beta \le \operatorname{Im} z < \beta + 2\pi\}$
with $\beta \in \mathbb{R}$. For now let us consider the principal
branch of the logarithm, that is $\beta = -\pi$.

To derive a solution, we examine the behavior of exp around one of its
fixed points. A fixed point of a function $f$ is a point $c$ with the
property that $f(c) = c$. The exponential function has an infinite
number of fixed points. Here we select the fixed point closest to the
real axis in the upper complex half-plane. Since log is a contraction
mapping, according to the Banach fixed-point theorem (Khamsi & Kirk,
2001) the fixed point of exp can be found by starting at an arbitrary
point $z \in \mathbb{C}$ with $\operatorname{Im} z \ge 0$ and
repetitively applying the logarithm until convergence. Numerically we
find

$$ \exp(c) = c \approx 0.318132 + 1.33724\, i $$

where $i = \sqrt{-1}$ is the imaginary unit.

Close enough to $c$ the exponential function behaves like an affine
map. To show this, let $z' = z - c$ and consider

$$ \begin{aligned}
\exp(c + z') - c &= \exp(c)\exp(z') - c = c\,[\exp(z') - 1] \\
&= c\,[1 + z' + O(|z'|^2) - 1] \\
&= c\,z' + O(|z'|^2).
\end{aligned} $$

Here we used the Taylor expansion of the exponential function,
$\exp(z') = 1 + z' + O(z'^2)$. Thus for any point $z$ in a circle of
radius $r_0$ around $c$, we have

$$ \exp(z) = c z + c - c^2 + O(r_0^2). \tag{15} $$

By substituting this approximation into (14) it becomes apparent that
a solution to Schröder’s equation around $c$ is given by

$$ \chi(z) = z - c \quad \text{for } |z - c| \le r_0 \tag{16} $$

where we have set $\gamma = c$.

Figure 3. Calculation of $\chi(z)$. Starting from point $z_0 = z$ the
series $z_{n+1} = \log z_n$ is evaluated until $|z_n - c| \le r_0$ for
some $n$. Inside this circle of radius $r_0$ the function value can
then be evaluated using $\chi(z) = c^n (z_n - c)$. The contours are
generated by iterative application of exp to the circle of radius
$r_0$ around $c$. Near its fixed point the exponentiation behaves like
a scaling by $|c| \approx 1.374$ and a rotation of
$\operatorname{Im} c \approx 76.6^\circ$ around $c$.

We will now compute the continuation of the solution to points outside
the circle around $c$. From (14) we obtain

$$ \chi(z) = c\, \chi(\log(z)). \tag{17} $$

If for a point $z \in \mathbb{C}$ repeated application of the
logarithm leads to a point inside the circle of radius $r_0$ around
$c$, we can obtain the function value of $\chi(z)$ from (16) via
iterated application of (17). In the next section it will be shown
that this is indeed the case for nearly every $z \in \mathbb{C}$.
Hence the solution to Schröder’s equation is given by

$$ \chi(z) = c^k\, (\log^{(k)}(z) - c) \tag{18} $$

with $k = \min_{k' \in \mathbb{N}} k'$ s.t.
$|\log^{(k')}(z) - c| \le r_0$. Solving for $z$ gives

$$ \chi^{-1}(\chi) = \exp^{(k)}(c^{-k} \chi + c) \tag{19} $$

with $k = \min_{k' \in \mathbb{N}} k'$ s.t. $|c^{-k'} \chi| \le r_0$.
Obviously we have $\chi^{-1}(\chi(z)) = z$ for all
$z \in \mathbb{C}$. However $\chi(\chi^{-1}(\xi)) = \xi$ only holds if
$\operatorname{Im}(c^{-k} \xi + c) \in [\beta, \beta + 2\pi)$.

The derivative of $\chi$ is given by

$$ \chi'(z) = \prod_{j=0}^{k-1} \frac{c}{\log^{(j)}(z)} \tag{20a} $$

with $k = \min_{k' \in \mathbb{N}} k'$ s.t.
$|\log^{(k')}(z) - c| \le r_0$ and we have

$$ \chi'^{-1}(\chi) = \frac{1}{c^k} \prod_{j=1}^{k} \exp^{(j)}(c^{-k} \chi + c) \tag{20b} $$

with $k = \min_{k' \in \mathbb{N}} k'$ s.t. $|c^{-k'} \chi| \le r_0$.

Figure 4. Domain coloring plot of $\chi(z)$. Discontinuities arise at
$0, 1, e, e^e, \ldots$ and stretch into the negative complex
half-plane. They are caused by log being discontinuous at the polar
angles $\beta$ and $\beta + 2\pi$.

2.3.1. The solution $\chi$ is defined on almost all of $\mathbb{C}$

The principal branch of the logarithm, i.e. restricting its imaginary
part to the interval $[-\pi, \pi)$, has the drawback that iterated
application of log starting from a point on the lower complex
half-plane will converge to the complex conjugate $\bar{c}$ instead of
$c$. Thus $\chi(z)$ would be undefined for $\operatorname{Im} z < 0$.

To avoid this problem, we use the branch defined by
$\log : \mathbb{C} \to \{z \in \mathbb{C} : \beta \le \operatorname{Im} z < \beta + 2\pi\}$
with $-1 < \beta < 0$. Using such a branch the series $z_n$, where

$$ z_{n+1} = \log z_n, $$

converges to $c$, provided that there is no $n$ such that $z_n = 0$.
Thus $\chi$ is defined on $\mathbb{C} \setminus D$ where
$D = \{0, 1, e, e^e, \ldots\}$.


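Proof. If $\operatorname{Im} z_n \ge 0$ then
$\arg z_n \in [0, \pi]$ and thus $\operatorname{Im} z_{n+1} \ge 0$.
Hence, if we have $\operatorname{Im} z_n \ge 0$ for some
$n \in \mathbb{N}$, then $\operatorname{Im} z_{n'} \ge 0$ for all
$n' > n$. Now, consider the conformal map

$$ \xi(z) = \frac{z - c}{z - \bar{c}} $$

which maps the upper complex half-plane to the unit disk and define
the series $\xi_{n+1} = \zeta(\xi_n)$ with

$$ \zeta(t) = \xi(\log \xi^{-1}(t)). $$

We have $\zeta : D_1 \to D_1$, where
$D_1 = \{t \in \mathbb{C} : |t| < 1\}$ is the unit disk; furthermore
$\zeta(0) = 0$. Thus by the Schwarz lemma $|\zeta(t)| < |t|$ for all
$t \in D_1$ (since $\zeta(t) \ne \lambda t$ with
$\lambda \in \mathbb{C}$) and hence
$\lim_{n \to \infty} \xi_n = 0$ (Kneser, 1950). This implies
$\lim_{n \to \infty} z_n = c$.

On the other hand, if $\operatorname{Im} z_n < 0$ and
$\operatorname{Re} z_n < 0$, then $\operatorname{Im} \log z_n > 0$ and
$z_n$ converges as above. Finally, if $\operatorname{Im} z_n < 0$ and
$\operatorname{Re} z_n > 0$, then, using $-1 < \beta$, we have
$\operatorname{Re} z_{n+1} \le |\log z_n| \le 1 + \log(\operatorname{Re} z_n) < \operatorname{Re} z_n$
and thus at some element $n'$ in the series we will have
$\operatorname{Re} z_{n'} < 1$ which leads to
$\operatorname{Re} z_{n'+1} < 0$. □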
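The convergence argued above can be checked numerically. The sketch below assumes a branch offset of $\beta = -0.5$ (any value between $-1$ and $0$ is admissible according to the text) and a fixed number of 300 iterations; the helper names `log_branch` and `iterate_log` as well as the start points are illustrative choices.

```python
import cmath
import math

BETA = -0.5  # assumed branch offset; the text only requires a value in (-1, 0)

def log_branch(z, beta=BETA):
    """Complex logarithm with imaginary part in [beta, beta + 2*pi)."""
    angle = cmath.phase(z)           # principal angle in [-pi, pi]
    if angle < beta:                 # move angles below beta up by one turn
        angle += 2.0 * math.pi
    return complex(math.log(abs(z)), angle)

def iterate_log(z, steps=300):
    """Evaluate the series z_{n+1} = log z_n for a fixed number of steps."""
    for _ in range(steps):
        z = log_branch(z)
    return z

# Start points in the upper and the lower half-plane (illustrative choices).
starts = [1j, 2 - 3j, 3 - 0.2j, 0.5 - 0.1j, -4 + 0j]
limits = [iterate_log(z) for z in starts]
```

With this branch every start point listed ends up at the same fixed point of exp in the upper half-plane, including those that begin below the real axis.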
Figure 5. Structure of the function $\chi(z)$ and calculation of
$\exp^{(n)}(1 + \pi i)$ for $n \in [0, 1]$ in $z$-space (left, axis
$\operatorname{Re} z$) and $\chi$-space (right, axis
$\operatorname{Re} \chi(z)$). The uniform grid with the cross at the
origin is mapped using $\chi(z)$. The points $1 + \pi i$ and
$\xi = \chi(1 + \pi i)$ are shown as hollow magenta circles. The black
line shows $\chi^{-1}(c^n \xi)$ and $c^n \xi$ for $n \in [0, 1]$. The
blue and purple points are placed at $n = 1/2$ and $n = 1$
respectively.

2.3.2. Non-integer iterates using Schröder’s equation

Repetitive application of Schröder’s equation (13) on an iterated
function (5) leads to

$$ \chi(f^{(n)}(z)) = \gamma^n\, \chi(z). \tag{21} $$

Thus the $n$th iterate of the exponential function on the whole
complex plane is given by

$$ \exp^{(n)}(z) = \chi^{-1}(c^n\, \chi(z)) \tag{22} $$

where $\chi(z)$ and $\chi^{-1}(z)$ are given by (18) and (19)
respectively. Since $\chi$ is injective we can think of it as a
mapping from the complex plane, called $z$-plane, to another complex
plane, called $\chi$-plane. By (22) the operation of calculating the
exponential of a number $y$ in the $z$-plane corresponds to complex
multiplication by factor $c$ of $\chi(y)$ in the $\chi$-plane. This is
illustrated in Fig. 5. Samples from $\exp^{(n)}$ are shown in Fig. 7.

While the definition for $\exp^{(n)}$ given by (22) can be evaluated
on the whole complex plane $\mathbb{C}$, it only has meaning as a
non-integer iterate of exp if the composition
$\exp^{(n)}[\exp^{(m)}(z)] = \exp^{(n+m)}(z)$ holds. Since this
requires that $\chi(\chi^{-1}(\xi)) = \xi$, let us define the sets
$E' = \{\xi \in \mathbb{C} : \chi[\chi^{-1}(c^m \xi)] = c^m \xi \;\; \forall m \in [-1, 1]\}$
and $E = \chi^{-1}(E')$. Then, for $z \in E$, $n \in \mathbb{R}$ and
$m \in [-1, 1]$, the composition of the iterated exponential function
is given by

$$ \begin{aligned}
\exp^{(n)}[\exp^{(m)}(z)] &= \chi^{-1}[c^n\, \chi(\chi^{-1}(c^m\, \chi(z)))] \\
&= \chi^{-1}[c^{n+m}\, \chi(z)] = \exp^{(n+m)}(z)
\end{aligned} $$

and the composition property is satisfied. The subset $E$ of the
complex plane where composition of $\exp^{(n)}$ for non-integer $n$
holds is shown in Fig. 6.
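The construction of complex non-integer iterates can be sketched as follows. This is an illustration rather than a reference implementation: the radius `R0 = 1e-6`, the branch offset `BETA = -0.5`, the way the fixed point is obtained (300 applications of the logarithm starting from $i$) and all function names are assumptions made here. Because the linear solution around the fixed point is only approximate, results are accurate to roughly the order of `R0`.

```python
import cmath
import math

BETA = -0.5   # assumed branch offset in (-1, 0)
R0 = 1e-6     # assumed radius of the circle around the fixed point

def log_branch(z):
    """Complex logarithm with imaginary part in [BETA, BETA + 2*pi)."""
    angle = cmath.phase(z)
    if angle < BETA:
        angle += 2.0 * math.pi
    return complex(math.log(abs(z)), angle)

def fixed_point(z=1j, steps=300):
    """Fixed point c of exp, found by repeatedly applying the logarithm."""
    for _ in range(steps):
        z = log_branch(z)
    return z

C = fixed_point()

def chi(z, max_iter=1000):
    """Schroeder function: chi(z) = c^k (log^(k)(z) - c) with minimal k."""
    k = 0
    while abs(z - C) > R0:
        z = log_branch(z)            # fails for z in D = {0, 1, e, ...}
        k += 1
        if k > max_iter:
            raise ArithmeticError("log iteration did not approach c")
    return C ** k * (z - C)

def chi_inv(v):
    """Inverse: chi_inv(v) = exp^(k)(c^(-k) v + c) with minimal k."""
    k = 0
    while abs(v) > R0:
        v /= C
        k += 1
    z = v + C
    for _ in range(k):
        z = cmath.exp(z)
    return z

def exp_n(z, n):
    """Complex non-integer iterate: exp^(n)(z) = chi_inv(c^n chi(z))."""
    return chi_inv(C ** n * chi(z))

z0 = 0.5 + 1.0j                      # illustrative point near the fixed point
half = exp_n(z0, 0.5)
```

For the point `z0` chosen here, applying the half iterate twice agrees with `cmath.exp(z0)` to within the accuracy set by `R0`; points outside the set $E$ are not expected to satisfy the composition property.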

The derivatives of $\exp^{(n)}$ defined using Schröder’s equation are
given by

$$ \exp'^{(n)}(z) = c^n\, \chi'^{-1}[c^n \chi(z)]\, \chi'(z) \tag{23a} $$

$$ \exp^{(n')}(z) = c^n\, \chi'^{-1}[c^n \chi(z)]\, \chi(z) \log(c). \tag{23b} $$



Figure 6. Function composition holds in the gray-shaded area for
non-integer iteration numbers, i.e.
$\exp^{(n)} \circ \exp^{(m)}(z) = \exp^{(n+m)}(z)$ for
$n \in \mathbb{R}$ and $m \in [-1, 1]$. We defined log such that
$\operatorname{Im} \log z \in [-1, -1 + 2\pi]$.

Figure 7. Iterates of the exponential function
$\exp^{(n)}(x + 0.5i)$ for $n \in \{0, 0.1, \ldots, 0.9, 1\}$ (upper
plots) and $n \in \{0, -0.1, \ldots, -0.9, -1\}$ (lower plots)
obtained using the solution (22) of Schröder’s equation; the panels
show $\operatorname{Re} \exp^{(n)}(z)$ and
$\operatorname{Im} \exp^{(n)}(z)$. Exp, log and the identity function
are highlighted in orange.

Hence we defined the continuously differentiable function
$\exp^{(n)} : \mathbb{C} \setminus D \to \mathbb{C}$ on almost the
whole complex plane and showed that it has the meaning of a
non-integer iterate of exp on the subset $E$.

3. Interpolation between addition and multiplication

Using fundamental properties of the exponential function we can write
every multiplication of two numbers $x, y \in \mathbb{R}$ as

$$ xy = \exp(\log x + \log y) = \exp(\exp^{(-1)} x + \exp^{(-1)} y). $$

We define the operator $\oplus_n$ for $x, y \in \mathbb{R}$ and
$n \in \mathbb{R}$ as

$$ x \oplus_n y = \exp^{(n)}\big( \exp^{(-n)}(x) + \exp^{(-n)}(y) \big). \tag{24} $$

Note that we have $x \oplus_0 y = x + y$ and $x \oplus_1 y = xy$. Thus
for $0 < n < 1$ the above operator continuously interpolates between
the elementary operations of addition and multiplication. We will
refer to $\oplus_n$ as the “addiplication operator”. Analogous to the
n-ary sum and product we will employ the following notation for the
n-ary addiplication operator,

$$ \sideset{}{_n}\bigoplus_{j=k}^{K} x_j = x_k \oplus_n x_{k+1} \oplus_n \cdots \oplus_n x_K. $$

Figure 8. The interpolation $2 \oplus_n 7$ between addition and
multiplication using (24) with $x = 2$ and $y = 7$. The iterates of
the exponential function are either calculated using Abel’s equation
(blue) or Schröder’s equation (orange). In both cases the interpolated
values exceed the range between $2 + 7 = 9$ and $2 \cdot 7 = 14$ and
therefore a local maximum exists.
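The addiplication operator can be tried out numerically for positive real operands. The sketch below uses the Abel-based real iterates; the helper functions repeat the construction of $\psi$ and $\psi^{-1}$ so that the block is self-contained, and all names (`psi`, `psi_inv`, `exp_n`, `addiplication`) are illustrative choices rather than the authors' code.

```python
import math

def psi(x):
    """Real solution of Abel's equation for exp (log until in [0, 1))."""
    if x < 0.0:
        return math.exp(x) - 1.0     # case k = -1
    k = 0
    while x >= 1.0:
        x = math.log(x)
        k += 1
    return x + k

def psi_inv(p):
    """Inverse of psi; -inf for p <= -1."""
    if p <= -1.0:
        return -math.inf
    if p < 0.0:
        return math.log(p + 1.0)     # case k = -1
    k = math.floor(p)
    x = p - k
    for _ in range(k):
        x = math.exp(x)
    return x

def exp_n(x, n):
    """Real non-integer iterate of exp."""
    return psi_inv(psi(x) + n)

def addiplication(x, y, n):
    """x (+)_n y = exp^(n)(exp^(-n)(x) + exp^(-n)(y))."""
    return exp_n(exp_n(x, -n) + exp_n(y, -n), n)

# Interpolation between 2 + 7 and 2 * 7 for n = 0, 0.1, ..., 1.
samples = [(n / 10, addiplication(2.0, 7.0, n / 10)) for n in range(11)]
```

The end points reproduce $2 + 7 = 9$ and $2 \cdot 7 = 14$, and at $n = 0.9$ the interpolated value lies slightly above 14, in line with the non-monotonic behaviour described for Fig. 8.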


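The derivatives of the addiplication operator w.r.t. its operands and
the interpolation parameter $n$ are calculated using the chain rule.
Using the shorthand

$$ E = \exp^{(-n)}(x) + \exp^{(-n)}(y) \tag{26a} $$

we have

$$ \frac{\partial (x \oplus_n y)}{\partial x} = \exp'^{(n)}(E)\, \exp'^{(-n)}(x) \tag{26b} $$

$$ \frac{\partial (x \oplus_n y)}{\partial y} = \exp'^{(n)}(E)\, \exp'^{(-n)}(y) \tag{26c} $$

$$ \frac{\partial (x \oplus_n y)}{\partial n} = \exp^{(n')}(E) + \exp'^{(n)}(E) \cdot \big[ \exp^{(-n')}(x) + \exp^{(-n')}(y) \big], \tag{26d} $$

where $\exp^{(-n')}(x)$ denotes the derivative of $\exp^{(-n)}(x)$
with respect to $n$.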


For positive arguments $x, y > 0$ we can use the iterates of exp based
either on the solution of Abel’s equation (11) or Schröder’s equation
(22). However, if we also want to deal with negative arguments, we
must use iterates of exp based on Schröder’s equation (22), since the
real logarithm is only defined for positive arguments. From the
exemplary addiplication shown in Fig. 8 we can see that the
interpolations produced by these two methods are not monotonic
functions w.r.t. the interpolation parameter $n$. In both cases local
maxima exist; however interpolation based on Schröder’s equation has
higher extrema in this case and also in general (personal
experiments). It is well known that the existence of local extrema can
pose a problem for gradient-based optimizers.


4. Neurons that can add or multiply

We propose two methods to construct neural nets that have units the
operation of which can be adjusted using a continuous parameter.

The straightforward approach is to use neurons that use addiplication
instead of summation, i.e. the value of neuron $y_i$ is given by

$$ y_i = \sigma\Big[ \exp^{(n_i)}\Big( \sum_j W_{ij} \exp^{(-n_i)}(x_j) \Big) \Big]. \tag{27} $$

For $n_i = 0$ the neuron behaves like an additive neuron and for
$n_i = 1$ it computes the product of its inputs. Because we sum over
$\exp^{(-n_i)}(x_j)$ which has dependence on the parameter $n_i$ of
neuron $y_i$, this calculation corresponds to a network in which each
neuron in layer $x$ has separate outputs for each neuron in the
following layer $y$; see Fig. 9a. Compared to conventional neural nets
this architecture has only one additional real-valued parameter per
neuron ($n_i$) but also poses a significant increase in computational
complexity due to the necessity of separate outputs. Since
$\exp^{(-n_i)}(x_j)$ is complex it might be sensible (but is not
required) to allow a complex weight matrix $W_{ij}$.

The computational complexity of separate output units can be avoided
by calculating the value of a neuron according to

$$ y_i = \sigma\Big[ \exp^{(\hat{n}_{y_i})}\Big( \sum_j W_{ij} \exp^{(\tilde{n}_{x_j})}(x_j) \Big) \Big]. \tag{28} $$

This corresponds to the architecture shown in Fig. 9b. The
interpolation parameter $n_i$ has been split into a
pre-transfer-function part $\hat{n}_{y_i}$ and a
post-transfer-function part $\tilde{n}_{x_j}$. Since $\hat{n}_{y_i}$
and $\tilde{n}_{x_j}$ are not tied together, the network is free to
implement arbitrary combinations of iterates of the exponential
function. Addiplication occurs as the special case
$\hat{n}_{y_i} = -\tilde{n}_{x_j}$.

Compared to conventional neural nets each neuron has two additional
parameters, namely $\hat{n}_{y_i}$ and $\tilde{n}_{x_j}$; however the
asymptotic computational complexity of the network is unchanged. In
fact, this architecture corresponds to a conventional, additive neural
net, as defined by (1), with a neuron-dependent, parameterizable
transfer function. For neuron $z_i$ the transfer function is given by

$$ \sigma_{z_i}(t) = \exp^{(\tilde{n}_{z_i})}\Big[ \sigma\Big( \exp^{(\hat{n}_{z_i})}(t) \Big) \Big]. \tag{29} $$

Consequently, implementation in existing neural network frameworks is
possible by replacing the standard sigmoidal transfer function with
this function and optionally using a complex weight matrix.
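The second architecture can be sketched as an ordinary weighted-sum layer whose inputs and pre-activations pass through non-integer iterates of exp. The code below is a simplified illustration: it uses the real, Abel-based iterates and therefore assumes positive inputs, whereas the text recommends the complex, Schröder-based iterates for general inputs. The names `layer`, `n_hat` and `n_tilde` and the example values are assumptions made here.

```python
import math

def psi(x):
    """Real solution of Abel's equation for exp."""
    if x < 0.0:
        return math.exp(x) - 1.0
    k = 0
    while x >= 1.0:
        x = math.log(x)
        k += 1
    return x + k

def psi_inv(p):
    """Inverse of psi; -inf for p <= -1."""
    if p <= -1.0:
        return -math.inf
    if p < 0.0:
        return math.log(p + 1.0)
    k = math.floor(p)
    x = p - k
    for _ in range(k):
        x = math.exp(x)
    return x

def exp_n(x, n):
    """Real non-integer iterate of exp."""
    return psi_inv(psi(x) + n)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def layer(W, x, n_hat, n_tilde):
    """y_i = sigma(exp^(n_hat_i)(sum_j W_ij exp^(n_tilde_j)(x_j)))."""
    pre = [exp_n(xj, nt) for xj, nt in zip(x, n_tilde)]
    return [sigmoid(exp_n(sum(w * p for w, p in zip(row, pre)), nh))
            for row, nh in zip(W, n_hat)]

# Illustrative weights and positive inputs (assumptions).
W = [[1.0, 2.0], [0.5, -1.0]]
x = [2.0, 3.0]

additive = layer(W, x, n_hat=[0.0, 0.0], n_tilde=[0.0, 0.0])
multiplicative = layer(W, x, n_hat=[1.0, 1.0], n_tilde=[-1.0, -1.0])
```

Setting all parameters to zero recovers the additive layer, while `n_hat = 1` together with `n_tilde = -1` recovers the product unit; intermediate values give the continuous transition that gradient-based training could adjust.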
Figure 9. Two proposed neural network architectures that can
implement addiplication. (a) Neuron $y_i$ calculates its value
according to (27). We have $y_i = \sigma(\hat{y}_i)$ and the subunits
compute $\tilde{x}_{ij} = \exp^{(-n_i)}(x_j)$ and
$\hat{y}_i = \exp^{(n_i)}\big( \sum_j W_{ij} \tilde{x}_{ij} \big)$.
The weights between the subunits are shared. (b) Neuron $y_i$
calculates its value according to (28). We have
$y_i = \sigma(\hat{y}_i)$ and the subunits compute
$\tilde{x}_j = \exp^{(\tilde{n}_{x_j})}(x_j)$ and
$\hat{y}_i = \exp^{(\hat{n}_{y_i})}\big( \sum_j W_{ij} \tilde{x}_j \big)$.

5. Applications 

5.1. Variable pattern shift 

Consider the dynamic pattern shift task shown in Fig. 10a: The input
consists of a binary vector $x$ of $N$ elements and an integer
$m \in \{0, 1, \ldots, N - 1\}$. The desired output $y$ is $x$
circularly shifted by $m$ elements to the right,

$$ y_n = x_{n-m}, $$

where $x$ is indexed modulo $N$, i.e. $x_{n-m}$ rolls over to the
right if $n - m < 0$.


Figure 10. (a) Variable pattern shift problem. Given a random binary
pattern $x \in \{0, 1\}^N$ and an integer
$m \in \{0, 1, \ldots, N - 1\}$ presented in one-hot encoding, the
learner should output the pattern $x$ circularly shifted to the right
by $m$ grid cells. (b) A neural net with two hidden layers that can
solve this problem by employing the Fourier shift theorem. The first
hidden layer is additive, the second is multiplicative; the output
layer is additive. All neurons use linear transfer functions. The
first hidden layer computes the DFT of the input pattern $x$ and shift
amount $s$. The second hidden layer applies the Fourier shift theorem
and the output layer computes the inverse DFT of the shifted pattern.

If we encode the shift amount $m$ as a one-hot vector $s$ of length
$N$, i.e. $s_j = 1$ if $j = m$ else $s_j = 0$, we can further rewrite
this as

$$ y_n = \sum_{m'=0}^{N-1} s_{m'}\, x_{n-m'}. \tag{30a} $$


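An architecturally efficient way to implement this task in a neural
network is based on the shift theorem of the discrete Fourier
transform (DFT) (Brigham, 1988). Let $\mathcal{F}(\{x_n\})_k$ denote
the $k$th element of the DFT of $x = \{x_0, \ldots, x_{N-1}\}$. By
definition we have

$$ \mathcal{F}(\{x_n\})_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i kn/N} $$

and its inverse is given by

$$ \mathcal{F}^{-1}(\{X_k\})_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k\, e^{2\pi i kn/N}. $$

The shift theorem states that a shift by $m$ elements in the time
domain is equivalent to a multiplication by factor
$e^{-2\pi i km/N}$ in the frequency domain,

$$ \mathcal{F}(\{x_{n-m}\})_k = \mathcal{F}(\{x_n\})_k\, e^{-2\pi i km/N}. $$

Hence the shifted pattern can be calculated using

$$ y = \mathcal{F}^{-1}\big( \{ \mathcal{F}(\{x_n\})_k\, e^{-2\pi i km/N} \} \big). $$

Using the above definitions its $n$th component is given by

$$ \begin{aligned}
y_n &= \frac{1}{N} \sum_{k=0}^{N-1} e^{2\pi i kn/N} \Big[ e^{-2\pi i km/N} \sum_{n'=0}^{N-1} x_{n'}\, e^{-2\pi i kn'/N} \Big] \\
&= \frac{1}{N} \sum_{k=0}^{N-1} e^{2\pi i kn/N} \Big[ \sum_{m'=0}^{N-1} s_{m'}\, e^{-2\pi i km'/N} \Big] \Big[ \sum_{n'=0}^{N-1} x_{n'}\, e^{-2\pi i kn'/N} \Big].
\end{aligned} \tag{30b} $$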

This corresponds to a neural network with two hidden layers (one
additive, one multiplicative) and an additive output layer as shown in
Fig. 10b. The optimal weights of this network are given by the
corresponding coefficients from (30).
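The three-layer computation described above can be verified with a small numerical sketch. It assumes the common DFT convention with a negative exponent in the forward transform and a factor $1/N$ in the inverse; the naive $O(N^2)$ transforms, the function names and the example pattern are illustrative choices.

```python
import cmath

def dft(x):
    """Forward DFT with kernel exp(-2*pi*i*k*n/N) (additive layer)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT with kernel exp(+2*pi*i*k*n/N) and factor 1/N."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def shift_network(x, s):
    """Additive layer (DFTs), multiplicative layer, additive output layer."""
    X = dft(x)                                  # DFT of the pattern
    S = dft(s)                                  # DFT of the one-hot shift
    P = [Xk * Sk for Xk, Sk in zip(X, S)]       # element-wise products
    return idft(P)                              # inverse DFT

x = [1, 0, 1, 1, 0, 0, 0, 1]                    # illustrative binary pattern
m = 3                                           # illustrative shift amount
s = [1 if j == m else 0 for j in range(len(x))]

y = [round(v.real) for v in shift_network(x, s)]
expected = [x[(n - m) % len(x)] for n in range(len(x))]
```

The only non-additive step is the element-wise product in the middle layer, which is exactly the operation a purely additive network cannot express directly.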

From this example we see that the ability to automatically determine
the function of each neuron is crucial for learning neural nets that
are able to solve complex problems.

6. Conclusion and future work 

We proposed one method to continuously and differentiably interpolate
between addition and multiplication and showed how it can be
integrated into neural networks by replacing the standard sigmoidal
transfer function with a parameterizable transfer function. In this
paper we presented the mathematical formulation of these concepts and
showed how to integrate them into neural networks.

We will perform simulations to see how these neural nets behave given
real-world problems and how to train them most efficiently.

While working on this theory, we already have two possible
improvements in mind:

1. Our interpolation technique is based on non-integer iterates of the
exponential function calculated using Abel’s and Schröder’s functional
equations. We chose this method for our first explorations because it
has the mathematically sound property that the calculated iterates
form an Abelian group under functional composition. However it results
in a non-monotonic interpolation between addition and multiplication
which may lead to a challenging optimization landscape during
training. Therefore we will try to find more ad-hoc interpolations
with a monotonic transition between addition and multiplication.

2. Our method introduces one or two additional real-valued parameters
per neuron for the transfer function. Using a suitable fixed transfer
function might allow these parameters to be absorbed back into the
bias.

While the specific implementation details proposed in this work may
have their drawbacks, we believe that neurons which can implement
operations beyond addition are a key to new areas of application for
neural computation.